Open Mind Animals: Insuring the quality of data openly contributed over the World Wide Web

Authors

  • David G. Stork
  • Chuck P. Lam
Abstract

We describe the Open Mind Initiative, a framework for building intelligent systems collaboratively over the internet, and focus on one of its simpler component projects, Open Mind Animals. The Initiative extends traditional open source development methods by allowing non-expert netizens to contribute informal data over the internet. Such data is used to train classifiers or guide automatic inference systems, and thus it is important that only data of high accuracy and consistency be accepted. We identify a number of possible sources of poor data in Animals — several of which are generic and applicable to a range of open data collection projects — and implement a system of software modules for automatically and semi-automatically preventing poor data from being accepted. Our system, tested in a controlled laboratory intranet, filters faulty data through a variety of mechanisms and leads to accurate decision tree classifiers. Our reusable modules can be employed in our planned large-scale internet deployment of Animals and other Open Mind projects.

Similar resources

An Architecture Supporting the Collection and Monitoring of Data Openly Contributed over the World Wide Web

Open data collection over the World Wide Web — in which any web user can contribute to large databases of “informal” data — presents several challenges that require novel approaches in human interface design, algorithmic machine learning and collaborative infrastructure. Foremost among these challenges is the need to ensure data integrity and quality, by automatically or semiautomatically ident...

Commonsense Knowledge Mining from the Web

Good and generous knowledge sources, reliable and efficient induction patterns, and automatic and controllable quality assertion approaches are three critical issues to commonsense knowledge (CSK) acquisition. This paper employs Open Mind Common Sense (OMCS), a volunteers-contributed CSK database, to study the first and the third issues. For those stylized CSK, our result shows that over 40% of ...

Open Mind Word Expert: Creating Large Annotated Data Collections with Web Users' Help

Open Mind Word Expert is an implemented active learning system that aims to create large annotated corpora by tapping into the world’s vast pool of knowledge. It does this by relying on the vast number of Web users who contribute their knowledge to data annotation. Open Mind Word Expert focuses on building semantically annotated corpora, by collecting word sense tagging from the general public ...

Toward a Computational Theory of Data Acquisition and Truthing

The creation of a pattern classifier requires choosing or creating a model, collecting training data and verifying or “truthing” this data, and then training and testing the classifier. In practice, individual steps in this sequence must be repeated a number of times before the classifier achieves acceptable performance. The majority of the research in computational learning theory addresses th...

Building a Sense Tagged Corpus with Open Mind Word Expert

Open Mind Word Expert is an implemented active learning system for collecting word sense tagging from the general public over the Web. It is available at http://teach-computers.org. We expect the system to yield a large volume of high-quality training data at a much lower cost than the traditional method of hiring lexicographers. We thus propose a Senseval-3 lexical sample activity where the tr...

Publication date: 2000